Abstract

In this work we analyze the possibility that soliton dynamics in a simple nonlinear
model allows functionally relevant predictions of the behaviour of DNA. This sug- 
gestion was first put forward by Salerno [Phys. Rev. A 44, 5292 (1991)] by showing 
results indicating that sine-Gordon kinks were set in motion at certain regions of a 
DNA sequence that include promoters. We revisit that system and show that the 
observed behaviour has nothing to do with promoters; on the contrary, it originates 
from the bases at the boundary, which are not part of the studied genome. We ex- 
plain this phenomenology in terms of an effective potential for the kink center. This 
is further extended to disprove recent claims that the dynamics of kinks [Lennholm
and Hörnquist, Physica D 177, 233 (2003)] or breathers [Bashford, J. Biol. Phys.
32, 27 (2006)] has functional significance. We conclude that no such information can 
be extracted from this simple nonlinear model or its associated effective potential. 
1 Introduction



Nonlinear models supporting coherent excitations have appeared in many fields of
science since the pioneering discoveries by Fermi, Pasta and Ulam [1] more than
50 years ago. The success of this approach in modeling complex systems has
encouraged its application in other fields. This is the case of biology, where
nonlinear models have been widely applied to many subjects, such as the study
of the DNA molecule (see, for example, [2,3,4]). To appreciate the relevance of
these models, note that the computational cost of molecular dynamics for
realistic models of DNA molecules with a few tens of base pairs currently limits
simulation times to tens of nanoseconds at most. Nonlinear models make such a
complex system tractable by drastically reducing its very many degrees of
freedom to as few as one per base pair, namely the one most relevant for the
process under study. Reducing a very complicated object such as the DNA duplex
to a polymer formed by base pairs, each one with just one degree of freedom
(sometimes a few more), greatly simplifies the theoretical and computational
study of these models. Although simplified, they can still yield important
results. An example is the Peyrard-Bishop model of DNA [5], which achieved an
important goal by describing the denaturation process of DNA in terms of just
the radial distance of the bases in each base pair [6].

Among all these approaches we focus here on the work of Englander et al.
[7], who introduced the sine-Gordon (sG) equation as a model for DNA in
1980. The existence of sG solitons in the DNA molecule has been surrounded
by controversy, as expected in a field where biology and physics do not always
meet in a fruitful way [8,9]. When Englander and coworkers introduced the
sG model of DNA, they based their hypothesis on experimental results that
showed unexpectedly long lifetimes of open states of DNA duplexes [10].
Although Guéron et al. [11] later found more reasonable lifetimes,
smaller by one or two orders of magnitude than those reported in previous
works, a vast amount of literature is still based on the Englander model. In this
context, the aim of this work is to analyze in depth part of the literature related
to the work of Englander et al., providing new results that give insight into
a number of important questions. Specifically, we will study the relation
between the dynamics of sG solitons and the position of promoters in the genome
of the bacteriophage T7. This line of work began with Salerno [12,13,14] at
the beginning of the 1990s and was subsequently continued in several works
[15,16,17,18,19]. We stress that this is a very important issue: if the
behaviour of the Englander model could be connected to functionally relevant
positions in the sequence, it would provide a cheap and efficient tool for genomics.
Although claims in this direction have been recently presented [19], the main
result of the present work is that, unfortunately, such a connection cannot be
substantiated.



The structure of the paper is as follows. In section 2 we discuss the methodology
and the results of the first two papers on this issue [12,13] in terms
of the effective potential introduced by Salerno in collaboration with Kivshar
in [14]. In section 3 we describe the main features of the promoters of the
T7 genome, and analyze the simulation results of the work of Lennholm and
Hörnquist [16] in terms of the effective potential. In section 4 we discuss recent
work on breathers in the sG model [19]. Finally, section 5 concludes the
paper by summarizing our main results and their implications.



2 Early work on T7: A1, A0 and A3 promoters

More than a quarter of a century ago, Englander and coworkers [7] introduced 
solitonic excitations into the DNA world as an initial step towards understand- 
ing the stability of open segments of DNA molecules [10]. They suggested the 
well-known sG model, that describes the dynamics of a line of pendula in a 
vertical gravitational field with torsional spring coupling between units, as an 
effective description of DNA molecules. In this way, the double-helix is ap- 
proximated by two parallel rods on which pendula (base pairs) are attached, 
and bonding to the opposite base is represented by a "gravitational" potential 
of each pendulum. Calling $\phi_i$ the twist angle of the $i$-th base, this model has
static soliton (kink) solutions given by

$$\phi_i = 4\arctan\left\{\exp\left[a\,(i-n_0)\right]\right\}, \qquad (1)$$

valid for $a \ll 1$, where the continuum approximation applies. In equation (1),
$n_0$ is the position of the kink center and $a$ is a dimensionless combination
of the parameters of the model that
acts as an effective discretization parameter of the continuum sG problem. In
spite of such a great oversimplification of the real problem, the model contained
the main feature of breaking a bond around the kink center. In addition to this, the
results were consistent with available data [10], although Englander et al. were
aware of the lack of evidence for solitonic excitations.

Salerno, in his pioneering and interesting work [12], tried to find a relation
between relevant sites in the T7 genome and the dynamics of sG kinks moving
along the inhomogeneous DNA sequence under study. The main difference with
respect to previous works was the introduction of the inhomogeneity of the
sequence in the model. To do so, he took the static kink solution (1), with
center at $n_0$, and used it as the initial condition for the equations of motion
of the discrete, inhomogeneous sG (or Englander) model,

$$\ddot\phi_i = \phi_{i+1} - 2\phi_i + \phi_{i-1} - q_i \sin\phi_i, \qquad (2)$$

$q_i$ being the parameter that carries all the information of the sequence under
study. It is defined as $q_i = \beta\lambda_i/K$, where $K$ is the torsional spring constant
between consecutive bases, $\beta$ is the energy of a hydrogen bond and $\lambda_i$ is
the number of hydrogen bonds in the $i$-th base pair, which is $\lambda_i = 2$ for AT base
pairs and $\lambda_i = 3$ for CG base pairs. Considered as a discrete version of the
continuum sG equation, the effective discretization of the lattice used in [12]
was $a = \bar q^{1/2}$, where $\bar q = \frac{1}{N}\sum_{i=1}^{N} q_i$ ($N$ being the number of bases of the
sequence). This value is around $a \simeq 0.07$, which is small enough to avoid
spurious discretization effects when numerically integrating Eq. (2). In fact,
taking Eq. (1) as an Ansatz in Eq. (2) was a good choice, as the kink is a
very robust object even in inhomogeneous sequences and its center can be
well defined by interpolating the position where $\phi = \pi$ [17,18].
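
As an illustration of the procedure just described, the following is a minimal sketch (not the code used in [12]) that integrates the discrete sG equation (2) starting from a static kink and locates the kink center by interpolating the position where the angle equals pi. The function names, the velocity-Verlet integrator, the time step and the free-end treatment of the boundaries are assumptions made for this example.

```python
import numpy as np


def static_kink(n_sites, n0, a):
    """Static kink 4*arctan(exp(a*(i - n0))); sites are numbered 0..n_sites-1."""
    i = np.arange(n_sites)
    return 4.0 * np.arctan(np.exp(a * (i - n0)))


def acceleration(phi, q):
    """Right-hand side of the discrete sG equation.

    Free ends are an assumed (reflective) boundary condition.
    """
    lap = np.empty_like(phi)
    lap[1:-1] = phi[2:] - 2.0 * phi[1:-1] + phi[:-2]
    lap[0] = phi[1] - phi[0]
    lap[-1] = phi[-2] - phi[-1]
    return lap - q * np.sin(phi)


def integrate(phi, q, dt=0.1, steps=1000):
    """Velocity-Verlet integration starting from rest; returns the final angles."""
    phi = phi.copy()
    vel = np.zeros_like(phi)
    acc = acceleration(phi, q)
    for _ in range(steps):
        vel += 0.5 * dt * acc
        phi += dt * vel
        acc = acceleration(phi, q)
        vel += 0.5 * dt * acc
    return phi


def kink_center(phi):
    """Kink center by linear interpolation of the position where phi = pi.

    Assumes a single kink whose center lies inside the lattice.
    """
    k = int(np.argmax(phi >= np.pi))  # first site at or beyond phi = pi
    return (k - 1) + (np.pi - phi[k - 1]) / (phi[k] - phi[k - 1])
```

On a homogeneous chain (all $q_i$ equal to $a^2$ with $a = 0.07$) a kink started at rest is expected to stay where it was placed; for a genomic sequence one would pass the array of $q_i$ built from the bases instead.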

Once the model was defined, Salerno built a sequence $\{q_i\}$ to introduce it in
(2). He was interested in the genomic sequence of the T7 A1 promoter but,
instead of using the original DNA sequence, he built a "synthetic" one from
the original. We review all the details of this process, as they are the key
to understanding the results of [12]. He took a sequence S of 168 bases containing
the so-called A1 promoter (further details on T7 promoters will be given in
the next section), which corresponds to base pairs (BP) 378 to 545 of
the actual T7 genomic sequence, and built a longer sequence of 1000 bases,
which we will call S', to prevent the influence of boundary conditions on the
dynamics of kinks:

$$S' = S(1,5) + 8\,S(1,50) + S(1,168) + 15\,S(141,168) + S(162,168). \qquad (3)$$

Here S(i, j) denotes the stretch of S from base i to base j, and a numerical
prefix denotes the number of consecutive copies, so that the total length is
5 + 400 + 168 + 420 + 7 = 1000 bases.

In this way, the 168-base sequence S would remain in the center of the
new sequence S', with the transcription start site located at BP 526 and the
promoter sequence going from BP 509 to BP 531, far from the limits of the
lattice. Therefore, reflective boundary conditions could be safely used in the
numerical simulations. We will return to this issue when discussing the results.
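
The construction of S' can be made concrete with a short sketch. It assumes that S(i, j) means bases i to j of S (1-based, inclusive) and that a numerical prefix means repetition; the helper names and the placeholder sequence are hypothetical, and the placeholder is not the real T7 fragment.

```python
def segment(seq, first, last):
    """S(first, last): bases first..last of seq, 1-based and inclusive."""
    return seq[first - 1:last]


def build_synthetic(s):
    """Assemble the 1000-base synthetic sequence S' from the 168-base S."""
    if len(s) != 168:
        raise ValueError("S must contain exactly 168 bases")
    return (segment(s, 1, 5)
            + 8 * segment(s, 1, 50)
            + segment(s, 1, 168)
            + 15 * segment(s, 141, 168)
            + segment(s, 162, 168))


# Placeholder sequence for illustration only; NOT the real T7 fragment.
example_s = "ACGT" * 42
example_s_prime = build_synthetic(example_s)
```

Under this reading, the full copy of S starts after 5 + 8 x 50 = 405 bases, so base j of S sits at position j + 405 of S' (for instance, BP 140 of S maps to BP 545 of S'). Everything outside that window is periodic filler built from the ends of S.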

As it was known that the RNA polymerase could bind to DNA in the region
of S going from BP 51 to BP 140 (BP 455 to BP 545 in S'), the
expectation was that this region should be dynamically active. Hence, in [12]
several integrations of Eq. (2) were carried out with the initial position of the
static kink at a variety of sites inside the promoter region, and the behaviour
of the kinks was studied as a function of their starting position. The results
were the following: For initial positions in S' from BP 415 to BP 505 the
kink remained static or performed small oscillations around the starting point. For
BP 510, the kink acquired a velocity $v = 0.18$ towards the left, was reflected
without loss of energy at the left boundary and was reflected again at the promoter
region with velocity $v = 0.18$. This behaviour was enhanced when the initial
position was increased from BP 510 to BP 535, where the kink reached the
maximum velocity $v = 0.3$. Beyond this point this dynamical behaviour was
drastically reduced. For BP 540 the kink acquired a small velocity ($v \simeq 0.08$)
towards the right, and for BP 555 the kink simply remained at rest. The
dynamics of a kink with initial velocity $v = 0.3$ towards the left was also
studied, starting from BP 900; it was found that the soliton was accelerated
when it traveled from right to left through the central region, then reflected
at the left end of the sequence, and decelerated when it traveled in the opposite
direction. It was concluded in [12] that these results showed the existence of
a dynamically "active" region going from BP 510 to BP 540 inside the T7 A1
promoter that could explain the functioning of DNA promoters as energetic
activators of the RNA polymerase transport process.

In a subsequent paper, Salerno and Kivshar [14] introduced the effective
potential in order to explain the behaviour of these objects when moving in an
inhomogeneous sequence. The idea is that the robustness of kinks allows one to
approximate their dynamics, even though they are extended objects, as if they were
point-like particles moving along a one-dimensional potential, given by

$$V_{\rm eff}(n) = \frac{\sum_m (\bar q + q_m)\,\mathrm{sech}^2\left(a(m-n)\right)}{2\sum_m \mathrm{sech}^2\left(a(m-n)\right)}. \qquad (4)$$


Recently this approach has been shown to give good results for Fibonacci 
[15] and DNA sequences other than the T7 one [17,18]. By "good results" 
we mean that the dynamics of the kink in Eq. (2) and that of the particle
in the effective potential (4) can be aligned, in the sense that trajectories
are semiquantitatively similar, equilibrium points for the kink correspond to 
minima of the potential, and so on. This was also the case with the effective 
potential introduced in [14]: This paper reported the agreement of the direction 
of motion of the kinks according to the effective potential curve corresponding 
to the sequence S', plotted from BP 425 to BP 605 (see Fig. 1). As can be seen 
from the figure, there is indeed a good correspondence between the effective 
potential and the simulation results summarized above. 
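
The effective potential (4) is a normalized, sech-squared weighted average of the sequence around each site, and it can be evaluated directly. The sketch below is only an illustration: the function names are hypothetical, and the value of beta/K is an assumption chosen so that the discretization parameter comes out close to 0.07; it is not taken from [12] or [14].

```python
import numpy as np

BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}  # hydrogen bonds per base pair


def q_from_sequence(seq, beta_over_k=0.002):
    """q_i = beta * lambda_i / K; beta_over_k = 0.002 is an assumed value."""
    return beta_over_k * np.array([BONDS[b] for b in seq], dtype=float)


def effective_potential(q, a=None):
    """Effective potential on every site n (uses O(N^2) memory).

    V(n) = sum_m (qbar + q_m) sech^2(a(m-n)) / (2 sum_m sech^2(a(m-n)))
    """
    q = np.asarray(q, dtype=float)
    qbar = q.mean()
    if a is None:
        a = np.sqrt(qbar)  # a = qbar^(1/2), as in the text
    sites = np.arange(q.size)
    # weights[n, m] = sech^2(a (m - n))
    weights = 1.0 / np.cosh(a * (sites[None, :] - sites[:, None])) ** 2
    return weights @ (qbar + q) / (2.0 * weights.sum(axis=1))
```

For a homogeneous sequence the potential is flat and equal to the mean of the $q_i$, so any structure in it comes from the local AT/CG content averaged over the kink width. For sequences much longer than a few thousand bases a truncated kernel would be preferable to the full weight matrix used here.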

Fig. 1. Effective potential for $a = 0.07$ of sequence S' from BP 425 to BP 605. The
only difference with the one represented in [14] is that, in the latter, an additional
average of the potential over a distance equal to $\bar q^{-1/2}$ was made.

However, a more detailed analysis shows that this correspondence is not enough
to establish a relation between DNA promoters and dynamically "active"
regions. Fig. 2a shows the effective potential $V_{\rm eff}(n)$ (taking $V_{\rm eff}(500)$
as the origin of energies) for sequence S' and for the original T7 genome
sequence. From this figure we immediately observe, on the one hand, that the
positions of the peaks and the wells of the effective potential of sequence S'
explain very well the kink dynamics results reported in [12] in terms
of a point-like particle; and, on the other hand, that the effective potential of
the original T7 genome far from the A1 promoter is very different from that
of the S' sequence, and hence the dynamics of kinks must be different,
too. Fig. 2b shows the dynamics of two kinks on the two sequences, S' and
the original T7 sequence. It is clear that the dynamical behaviour on the two
sequences is very different: for instance, in the true potential BP 510 should
not be regarded as an active site, whereas BP 680 should. The
comparison of the trajectories with those obtained from the effective potential
confirms the validity of this potential to describe the dynamics of kinks
(allowing for a difference in time scales, as in the point-like particle description
time units are arbitrary). This means that the effective potential is a correct
description for both the real sequence and S', and therefore the differences
between them are not an artifact of this approximation. These
differences between the two potentials come from the periodic sequences introduced
in (3), 8 S(1, 50) and 15 S(141, 168), adjacent to the 168-nucleotide S sequence.
The AT/CG content of the periodic sequences has an average value around
which the effective potential of these sites oscillates. As further evidence of
the influence of the ends of the sequence, in Fig. 3 we show the effective
potential of sequences S', $S'_1$ and $S'_2$, with $S'_1$ = 405 A + S(1, 168) + 427 A
and $S'_2$ = 405 C + S(1, 168) + 427 C, where N A (N C) means N consecutive
sites with nucleotide A (C). The effective potentials for kinks moving along
$S'_1$ and $S'_2$ will lead to very different dynamics from that described in [12]
and reported here, although they all have the same central sequence of 168
nucleotides.
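
The point-like particle description used in this comparison can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the particle has unit mass (time units are arbitrary in this description, as noted in the text), the force is obtained by finite differences of the tabulated potential, and the function name and integrator are hypothetical choices.

```python
import numpy as np


def particle_trajectory(v_eff, x0, dt=0.05, steps=2000, mass=1.0):
    """Trajectory of a particle released at rest at x0 in a tabulated potential.

    v_eff[n] is the potential on site n; the force between sites is linearly
    interpolated. Unit mass is an assumption (time units are arbitrary).
    """
    v_eff = np.asarray(v_eff, dtype=float)
    sites = np.arange(v_eff.size)
    force_on_sites = -np.gradient(v_eff)  # finite-difference force per site
    x, vel = float(x0), 0.0
    traj = np.empty(steps + 1)
    traj[0] = x
    force = np.interp(x, sites, force_on_sites)
    for k in range(steps):
        # velocity-Verlet step
        vel += 0.5 * dt * force / mass
        x += dt * vel
        force = np.interp(x, sites, force_on_sites)
        vel += 0.5 * dt * force / mass
        traj[k + 1] = x
    return traj
```

A particle released at rest on a slope of the potential moves towards the nearest well, while one released at a minimum stays put, which is the behaviour compared with the kink-center trajectories. Because the flanking regions shape the potential far from the central 168 bases, the same starting site can lead to different trajectories on S' and on the genomic sequence.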

In [13] the same methodology developed in [12] was used to analyze another
two T7 promoters, namely A0 (also called D) and A3, and similar results were
obtained (see Table 1). Figs. 4a and 4b show the effective potential of the
synthetic sequences built by Salerno from the genomic 168-nucleotide sequences
and the effective potentials of the real T7 sequences around the promoters.
Again, the effective potentials of the synthetic sequences describe the results
of the dynamics summarized in Table 1, but differ from the effective
potentials of the real genomic sequences, yielding different dynamics. For instance,
according to Fig. 4a, a kink starting from BP 245 in the real genomic sequence
around the A0 promoter would reach the right end of the sequence, instead of
oscillating around the initial starting position, as it would do in the corresponding
S' sequence.

Fig. 2. (a) Effective potential for $a = 0.07$ around the T7 A1 promoter for the
synthetic sequence S' (solid line) and for the original genome sequence (dashed
line). The potential of the original sequence has been shifted along the horizontal and
vertical axes to make it coincide with that of the 168-base sequence S in S'. (b)
Dynamics of the center of two kinks (calculated by interpolating the position where
$\phi = \pi$), moving along the S' sequence (solid lines) and the real T7 genome sequence
(dashed lines), starting from BP 510 and BP 680. Inset: point-like particle dynamics
starting from the same sites and moving according to the corresponding effective
potentials of (a). Although the dynamics is scaled in time with respect to the kink
dynamics, the trajectories of each particle/kink pair are the same.

We note that in [13] it was argued that, as the initial static soliton was always
well inside the original 168-base sequences, the flanking regions used to
prolong the chain played no role in the dynamical effects described. However,
we have just shown how important they are when the kink moves towards
them. Therefore, we are forced to conclude that the results in [12,13] are
highly dependent on the construction of the sequences S', and that when the
original T7 genome sequence is used instead, the promoter regions cannot
be considered "active" or "special" regions anymore. As we have seen, other
regions close to, but different from, the promoters may be even more "active"
in the sense of inducing kink motion; conversely, some active regions in the
synthetic sequence lose this character in the real genome.

Fig. 3. Relevance of the end parts of the sequence: Effective potential for the
synthetic sequences S', $S'_1$ and $S'_2$ (see text for definitions).

Promoter     BP region          Response
A1           From 510 to 535    Leftward propagation, strongest at 535.
             540                Small velocity towards the right.
A0 (or D)    From 530 to 540    Leftward propagation, strongest at 540.
             From 543 to 555    Rightward propagation, strongest at 543.
A3           From 435 to 460    Rightward propagation.

Table 1
Summary of the dynamical results for kinks moving in the synthetic sequences S'
obtained from the A1, A0 (also called D) and A3 promoters in [13].




3 Subsequent developments: full T7 genome 



Following the interesting proposal of Salerno, namely the putative relation
between T7 promoters and the dynamics of solitons moving along inhomogeneous
genomic sequences, further research was intended to shed more light on
this question [16]. The main contribution of this sequel is that the sequences
used were real genomic sequences and, in addition, that the whole T7 genome
was studied.

In the research reported in [16], Lennholm and Hörnquist measured the maximum
distance (in either direction) reached by initially static kinks starting
from each of the sites of the whole sequence of the T7 genome. They also took
the 24 promoters of the T7 genome (except the first and the last ones, to avoid
boundary effects), studied the results obtained for positions going from -4 to
-1 of each promoter and compared these results with those of the whole
genome. The aim of this analysis was to find whether the RNA polymerase
melting region (the one going from -4 to -1 in each promoter) acts as a dynamically
"active" region as proposed by Salerno, or behaves in the same way as
the rest of the nucleotides of the genome. In this respect, they did not find
relevant differences (see Fig. 1 of [16]). However, for every promoter they
investigated the activity of the first n base pairs which are transcribed by RNA
polymerase and found that, for n = 20, the studied regions are more active
than average with a significance of more than five standard deviations (see Fig.
2 of [16]). They did not give any biological interpretation of those results but,
in their conclusions, they suggested that a more quantitative relation between
kink motion and the effective potential should be established.

Fig. 4. (a) Effective potentials for $a = 0.07$ of the synthetic sequence S' (solid line)
and for the original genome sequence (dashed line) around the T7 A0 (also called D)
promoter. The sequence used in [13] corresponds to the transcription strand (this
promoter activates transcription leftwards, and therefore it uses the complementary
strand of the sequence usually shown [20,21]) in the transcription order. Therefore,
the original sequence has been written in reverse order to obtain its potential,
and then shifted along the horizontal and vertical axes to make it coincide with
the potential of the S sequence in the synthetic S' sequence. (b) Effective potentials
for $a = 0.07$ of the synthetic sequence S' (solid line) and for the original genome
sequence (dashed line) around the T7 A3 promoter. The potential of the original T7
sequence has been shifted in order to make it coincide with the potential of the S
sequence in the synthetic S' sequence.

As already mentioned, this paper as well as our previous work [15,17,18] has
established the agreement between kink dynamics and effective potential. Therefore,
we can now study the whole T7 genome in terms of this tool. To this
end, we review some of the properties of the T7 promoters in order to
set a methodology for the study of the effective potential in these regions. The
T7 phage genome has been one of the most studied genomes since its complete
sequence was determined in 1983 [20], and few changes in the sequence have been
reported since then [21]. The reproductive cycle of the T7 phage is closely
linked to the promoter and gene distribution in the genome. When the T7
DNA is injected into a bacterium, such as E. coli, the bacterial RNA polymerase
starts to produce mRNAs from three major promoters in the early
region (or class I region), A1, A2 and A3. A fourth major E. coli promoter, A0
(also called D), that would direct transcription leftward, and several minor
E. coli promoters function in vitro but have no known in vivo function. Once
the T7 phage has its own transcription machinery, late mRNAs are produced
from 15 promoters for T7 RNA polymerase distributed across the right-most
85% of the DNA (divided into class II and class III regions). There are also two
T7 promoters associated with possible origins of replication at the left and
right ends of T7 DNA. The 23 base-pair consensus sequence for T7 promoters
stretches from -17 to +6, where +1 is the transcription start site. This means
that the nucleotides of the promoters are highly correlated at these sites
(although sometimes they are not strictly the same), but not in the rest of the
sequence (we will come back to consensus sequences below).

Fig. 5. Effective potentials for $a = 0.07$ around the transcription start site of: (a)
Early A1, A2 and A3 E. coli promoters; (b) Late φ1.1A, φ1.1B, φ1.3, φ1.5 and φ1.6 T7
promoters; (c) Late φ2.5, φ3.8, φ4c, φ4.3 and φ4.7 T7 promoters; (d) Late φ6.5, φ9,
φ10, φ13 and φ17 T7 promoters. Origins are referred to the transcription start site
in all cases.

We can now go back to our main aim: We want to find out whether or not
there is some kind of pattern in the effective potential, or a set of properties
shared by all the promoters in the T7 genome, that allows their identification
among the rest of the genome. To this end, we must keep in mind that the
effective potential on each site [see Eq. (4)] is just a weighted average of
the sequence around the site, with weight function $\mathrm{sech}^2(an)$. We can obtain
an estimate of the resolution of the effective potential reading frame by
noticing that an error of about 10% in the effective potential is introduced
when truncating the sum in the weighted average (4) at $\pm\Delta n = 1.5\,a^{-1}$ around
each $n$. If we consider that, for sites further away, the contribution of the $q_m$ to
the weighted average is negligible, then the number of sites averaged when
computing the effective potential on each site goes as $\Delta n \simeq 3a^{-1}$. This means
that, for $a \simeq 0.07$ (which is the approximate value of the discretization as
explained in section 2), $\Delta n$ is about 40, a much lower resolution than the
one needed to recognize the 23 base-pair consensus sequence in the effective
potential. Therefore, we conclude that the kink is too wide to allow us to check
that the same curve describes the effective potential of different promoters, as
was suggested in [12,13,16]. On the other hand, we can increase the resolution
of the effective potential by increasing the discretization $a$ until reaching
$\Delta n = 1$. We could then find the consensus sequence repeated in the effective potential
around each promoter, but that would not give more information than the
consensus sequence itself, and the effective potential would not be useful from
a genomic point of view.
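
The 10% figure quoted for the truncation error can be checked with a short estimate. The following is a sketch that replaces the sums over sites by integrals, which should be reasonable for $a \ll 1$; this continuum step is an assumption of the estimate, not part of the original derivation. With $u = a(m-n)$, the fraction of the total weight retained within $|m-n| \le 1.5\,a^{-1}$ is

$$\frac{\sum_{|m-n|\le 1.5/a}\mathrm{sech}^2\left(a(m-n)\right)}{\sum_{m}\mathrm{sech}^2\left(a(m-n)\right)} \approx \frac{\int_{-1.5}^{1.5}\mathrm{sech}^2 u\,du}{\int_{-\infty}^{\infty}\mathrm{sech}^2 u\,du} = \frac{2\tanh(1.5)}{2} \approx 0.905,$$

so the neglected weight is about 9.5%, that is, roughly 10%. The corresponding window has a total width of $3a^{-1} = 3/0.07 \approx 43$ sites, which is the origin of the value of about 40 sites quoted in the text.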

To go beyond the previous theoretical discussion, we have computed the effective
potential for most of the T7 promoters for $a = 0.07$. In Fig. 5 we show
the effective potential of the three major E. coli RNA polymerase promoters
(early promoters) and the T7 RNA polymerase promoters (late promoters)
of T7. Clearly there is no "consensus effective potential" that
appears in all (or even in most) of the promoters. If we were to look for more
subtle properties that might encompass all the 18 promoters, or each subset of
early and late promoters, we would be led to consider as promoters other
regions of the T7 genome which are not. As an example, the effective
potentials around some other regions that are not promoters are plotted in Fig. 6,
which shows how alike they are to the ones of Fig. 5. Hence, we believe
that the effective potential of kinks, and therefore the dynamics of kinks in the
inhomogeneous sG model, cannot explain the initiation of the transcription
process at the promoters in the T7 phage and, probably, in any other organism.

Fig. 6. Effective potential for $a = 0.07$ at different sites of the sequence, which may
look like the effective potential of promoters in Fig. 5 but which are not promoters.
The origin of each sequence corresponds to the number in the legend box.



We now turn to the work in [16] in order to understand the difference with the
conclusions reported there. The research in [16] is certainly interesting because,
as we already said, it is the first time that the whole genomic sequence of the
T7 phage is taken into account. However, we believe that their methodology
is not appropriate for the case under study, as the statistical analysis of kink
dynamics does not give conclusive results. For instance, a graph of the
furthest position reached in the sequence as a function of the initial position from
which the kink started to move would have yielded different results from the
ones reported in [12,13], and the work would have been more conclusive. In
addition, we note that the direction of motion of the kinks was not recorded,
and therefore it cannot be assessed whether or not the "activity" of those
regions agrees with the transcription direction. We therefore conclude that an
individual study of each promoter is needed if functionally relevant places are
to be found. This individual study is what we have presented here and we
believe that the conclusion is clear: The effective potential shows no signature
of the promoters. Having verified this, in the next section we show that, if a
detailed study of promoters is done, it must cover all the promoters of the
phage in order to be conclusive.
4 sG breathers

In this last section, we consider yet another recent approach to sG soliton
dynamics that followed the steps of [12,13,14] but using breathers instead of
kinks [19]. In this case, the author studied RNA polymerase recognition of
specific binding sites by comparing breathers to localised deformations of the
DNA duplex when the RNA polymerase slides along its major groove. To this
end he constructed a potential for sG breathers following the steps of [14], but
using as Ansatz a discretized breather $\phi_{{\rm br},n-n_0}(t)$, given by

[Eq. (5): display not recoverable from the source]

where $\mu$ is related to the intrinsic frequency of the breather. The potential
obtained in [19] is the following:

[Eq. (6): display not recoverable from the source; surviving fragment: "$\ldots + q_m)\,\cosh(a(m-n))$"]

with $a = \bar q^{1/2}\sin\mu \simeq 0.04$. The main difference between this potential and the
one obtained in [14] is that breathers defined by (5) are not static solutions of
the sG model. This means that the kinetic term of the sG
Hamiltonian (that we do not write here) has two extra terms, which arise when
differentiating $\phi_{{\rm br},n-n_0}(t)$ with respect to time and squaring it, and that the
potential term obtained depends explicitly on time. These problems may be solved by moving
one of the extra terms from the kinetic to the potential term, and then
integrating in time over a period. Another important difference with [14] is that
kinks are very robust objects that behave very well in the discrete sG model,
even for inhomogeneous sequences, and that is why they can be described
in terms of their center. Breathers, however, are very unstable in the homogeneous,
discrete sG model, and it is to be expected that they are even more so
on inhomogeneous sequences. Therefore, we do not think that the potential
for breathers can describe accurately and/or for long times the dynamics of
an initially static breather. The construction of the sG potential for breathers
is, therefore, not as straightforward as the one for kinks in [14], and this must
be taken into account when analyzing the results.

After the breather potential (6) was constructed in [19], it was used to analyze
the early region of the T7 genome and a particular region of the T5 phage. It
was suggested, among other things, that there is a correlation between deep
wells of (6) and promoters in the early region and class III T7 promoters.
Another relation between deep wells and transcription terminators was also
suggested. We can now apply the results obtained in section 3 to the claims
in [19] by noticing how alike the weight functions of (4) and (6) are
for $\tan\mu < 1$ (which is the case, according to [19]). Therefore, the structure
of peaks and wells is very similar in both cases, as shown in Fig. 7, and
we can thus extend the conclusions of section 3: Even if the potential for
sG breathers works in a similar way to the effective potential for sG kinks,
it is not enough to explain the transcription process of RNA polymerase.
For instance, as shown in [19], there are deep wells in the potential for sG
breathers near some of the promoters of the T7 genome. However, there are
other promoters (class II) which are not near any deep well, and also deep wells
which are not near promoters (like the ones found in [19] near terminators).
When loosening the constraints in order to take into account not-so-deep wells
which are near promoters, many other wells far from promoters should be
considered, too. We therefore conclude that there is no special characteristic
in the potential for breathers that allows the identification of promoters among
the rest of the genome by simply looking at the effective potential.

Fig. 7. Potential $V_{\rm br}$ (6) for breathers (solid line) and $V_{\rm eff}$ (4) for kinks (dashed line)
for the T7 genome region going from BP 500 to BP 7000 for $a = 0.04$ and $\mu = \pi/6$
(which corresponds to the values used in the figures of [19]). The scale of $V_{\rm br}$ has
been divided by 50 in order to make it fit the scale of $V_{\rm eff}$. No vertical shifting
was applied to either potential. This region of the genome contains seven promoters.



5 Conclusions 

The Englander model was introduced in [7] to explain long life times on open 
states of DNA duplexes [10] by means of the well known nonlinear sG model. 
Subsequently, research on the sG model led to suggestions of a relation between 
functionally relevant positions in the sequence with dynamical properties of 
sG solitons. In section 2 we showed that the results on the dynamics of kinks moving
along inhomogeneous sequences reported in [12,13] depend strongly on the sequence
under study. To show this we used the effective potential
for sG kinks moving on inhomogeneous sequences, introduced in [14]. Applying it
to the sequences used in [12,13] and to the corresponding real genomic
sequences of the T7 phage, we observed important differences between the two
potentials. The differences came from the end parts of the analyzed sequences,
which were a priori assumed not to play any role. With these findings, and
taking into account the good results already obtained for the particle-like
approximation of sG kinks moving along inhomogeneous sequences [15,17,18],
we concluded that early promoter regions of the T7 genome cannot be considered
dynamically "active". In section 3, addressing the question posed in
[16], we searched for patterns that could differentiate the dynamics of kinks 
starting from T7 promoters from kinks starting from the rest of the genomic 
sequence. Again, we used the effective potential, this time applied to the whole 
genomic sequence of the T7 phage. Comparing the curves obtained for the 18
major promoters of the phage with one another and with non-promoter
regions led us to think that there were no special properties of the effective
potential around promoter regions, and therefore that the dynamics of kinks
starting from these regions was the same as in other genomic regions. Finally,
in section 4 we applied the same arguments and also reviewed the problems of 
the potential for breathers in order to demonstrate that the potential for sG 
breathers obtained in [19] cannot be used to differentiate promoter regions in
the genomic sequence. From all this evidence, we can confidently claim that
neither the sG model nor its description in terms of the effective potential
gives hints about functionally relevant sites of DNA sequences. We stress that
this claim is about the sG model and the dynamics of its solitons. Statistical
mechanics approaches are also being studied with some success [25,22,23,24],
but they constitute a completely different line of work.

It is important to extend this discussion to include its biological implications. 
The relation of deep wells and functioning sites of DNA can now be discussed 
in terms of properties of bacterial promoters [26,27]. Bacterial RNA polymerase
is a multisubunit complex. A detachable subunit, called the σ factor, is
responsible for reading the promoters, which are the signals encoded in the
DNA that tell it where to begin transcribing. Most bacteria contain multiple
σ factors that enable the recognition of different sets of promoters. A comparison
of many different bacterial promoters reveals that they are heterogeneous
in DNA sequence. However, they all contain related sequences, reflected in
mechanical and electrostatic properties of the DNA double helix, that are
recognized by the σ factor. These common features are often summarized in
the form of a consensus sequence, which serves as a summary or "average"
of a large number of individual nucleotide sequences. The precise sequence
determines the strength (or number of initiation events per unit time) of each
promoter. However, although the σ factor is needed for transcription initiation,
other elements can bind to RNA polymerase to regulate the transcription
of specific promoters, like the α subunits. Another important group of proteins
that recognize and bind to promoters are transcription factors. These proteins
act as regulatory elements that control transcription initiation and bind
to specific sequences. This summary of regulatory elements of transcription
initiation in prokaryotes reveals the intrinsic complexity of promoter
sequences. In the case of eukaryotic regulation the complexity is too great
to summarize in this paragraph, and we will just refer to the
counter-intuitive fact that specific CG-rich promoters are found in yeast
[28]. Therefore, we conclude that, although deep wells in the potential for sG
kinks or breathers are correlated with AT-rich regions, they are not enough 
to recognize such complex structures as promoters, and it is only natural that 
the dynamics of these simple excitations cannot capture the mechanisms of 
promoter function. 